Add alerts for bounce notifications jobs #2119

Merged
stephencdaly merged 2 commits into main from add-alerts-for-bounce-notifications-jobs on May 15, 2026

@stephencdaly
Contributor

What problem does this pull request solve?

Trello card: https://trello.com/c/WmCipXbC

Add alarms that fire when:

  • The daily job that checks for bounced submission deliveries and emails the group/org admins has not run for more than a day
  • Jobs in the bounce_notifications queue have failed

Both alerts will create a Zendesk ticket if they fire in the production environment.

Things to consider when reviewing

  • Ensure that you consider the wider context.
  • Does it work when run on your machine?
  • Is it clear what the code is doing?
  • Do the commit messages explain why the changes were made?
  • Are there all the unit tests needed?
  • Has all relevant documentation been updated?

Reminders

If you've made changes to the deployer role (files in modules/deployer-access):

  • Remember to run make <environment> forms/account apply on the relevant environments (dev, staging and/or prod)
  • Check the #govuk-forms-deployment-notifications Slack channel to ensure the apply-forms-terraform-<environment> pipelines have run successfully

Add an alarm that will fire if the ScheduleBounceNotificationsJob
does not run for over a day. If this job does not run, then bounce
notifications will not be sent out to group/org admins.

Alarm if there are 25 consecutive 1-hour periods with no runs of the
ScheduleBounceNotificationsJob. This allows for some flexibility in
the schedule in the case of downtime.
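
The evaluation rule described above can be sketched in a few lines of Python. This is only an illustration of the "25 consecutive empty 1-hour periods" logic, not the Terraform added in this pull request. The function name and the list-of-hourly-counts input are hypothetical, and the sketch assumes that an hour with no data counts as an hour with zero runs.

```python
from typing import Sequence

PERIOD_HOURS = 1         # length of one evaluation period (from the commit message)
EVALUATION_PERIODS = 25  # consecutive empty periods required (from the commit message)


def job_not_run_alarm(hourly_starts: Sequence[int]) -> bool:
    """Return True when the most recent 25 hourly periods all had zero job starts.

    `hourly_starts` is ordered oldest to newest; each entry is the number of
    job starts counted in that hour. With fewer than 25 periods of history the
    sketch does not alarm (an assumption made for this illustration).
    """
    if len(hourly_starts) < EVALUATION_PERIODS:
        return False
    window = hourly_starts[-EVALUATION_PERIODS:]
    return all(count == 0 for count in window)


# A run followed by 24 empty hours: the 25-period window still contains the run.
on_time = [1] + [0] * 24
# A run followed by 25 empty hours: every period in the window is empty.
missed = [1] + [0] * 25
```

With these inputs, `job_not_run_alarm(on_time)` is `False` and `job_not_run_alarm(missed)` is `True`. The extra period beyond 24 is what lets a daily run start up to an hour late without firing the alarm.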

Alert at info level, meaning that we send an email to the infra
test email for dev/staging and to Zendesk for production.
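
As a rough sketch of the info-level routing described above, the mapping from environment to destination could be modelled as follows. The function name and the destination labels are hypothetical placeholders chosen for illustration. The real routing is configured in the infrastructure code, which is not shown here.

```python
def info_alert_destination(environment: str) -> str:
    """Return where an info-level alert is delivered for `environment`.

    The returned labels are placeholders, not real addresses or topic names.
    """
    if environment == "production":
        return "zendesk"  # production alerts create a Zendesk ticket
    if environment in ("dev", "staging"):
        return "infra-test-email"  # non-production alerts go to the infra test email
    raise ValueError(f"unknown environment: {environment!r}")
```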

Add an alarm that creates a Zendesk ticket in the tech support queue if
there are failed executions for the bounce_notifications Solid Queue
queue for forms-runner.

@stephencdaly marked this pull request as ready for review May 15, 2026 09:05
Copilot AI review requested due to automatic review settings May 15, 2026 09:05
Copilot AI left a comment

Pull request overview

Adds CloudWatch alarms to improve operational monitoring of bounce-notification background processing in forms-runner, ensuring failures or missing daily scheduling are surfaced (and create Zendesk tickets in production).

Changes:

  • Added an alarm that fires if the daily bounce-notifications scheduling job hasn’t run within the expected window.
  • Added an alarm that fires when Solid Queue jobs in the bounce_notifications queue have failed and won’t be retried.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| infra/deployments/forms/health/alerts/schedule-bounce-notifications-job-not-run.tf | Adds a “job not run” CloudWatch alarm for ScheduleBounceNotificationsJob using the existing Forms/Jobs Started metric pattern. |
| infra/deployments/forms/health/alerts/failed-bounce-notifications-job-executions.tf | Adds a “failed job executions” CloudWatch alarm for the bounce_notifications Solid Queue queue using the existing FailedJobExecutions pattern. |

Contributor

@whi-tw left a comment

this LGTM

@stephencdaly added this pull request to the merge queue May 15, 2026
Merged via the queue into main with commit 600bda1 May 15, 2026
22 checks passed
@stephencdaly deleted the add-alerts-for-bounce-notifications-jobs branch May 15, 2026 11:03
3 participants